max rank | avg. rank | sentence |
---|---|---|
146 | 66.0909 | I should get what I want and when I want it. |
157 | 57.4000 | It's been there for more than a year, I think. |
174 | 62.0000 | That would be end of that and you know it! |
182 | 88.2308 | We know how we can play and we know how good we are. |
190 | 91.7778 | But I don't think many of them do, really. |
190 | 84.6667 | “But, really, we would take what we could get. |
203 | 76.2222 | There are just not that many of them around. |
206 | 94.1667 | It was a good win for us but we can play better. |
210 | 79.7778 | "You can only get them with me," he said. |
221 | 95.7273 | "You never know what game is going to be your last. |
241 | 136.3750 | It's better being around people like over here. |
243 | 95.9231 | "And I'm going to be president for the next four years, I think. |
254 | 83.3846 | I didn't know if he was going to take me out or what. |
254 | 92.0909 | It was more about who you didn't want to see. |
255 | 97.1538 | "I think we still have a lot of work to do. |
270 | 101.7778 | "You think you know things about her," he said. |
270 | 100.6364 | I think there are still some things they could do better. |
270 | 80.5833 | But I like a lot of the things that we have. |
271 | 74.0000 | "For most people, what we do is who we are. |
275 | 119.2000 | I'm here for the family to support them," she said. |
276 | 87.0000 | "If they say that, I know where that place is. |
282 | 157.2857 | He hit another, then another, then another. |
294 | 109.7500 | They will be without power for some time. |
296 | 91.4444 | "The look was something I had not had before. |
300 | 87.0000 | If we can help some of them, that would be great. |
300 | 71.0000 | It was a great game for him, but that game is over now. |
303 | 108.7500 | So there are a lot of players out there," he said. |
307 | 71.3000 | And that was the last day I was at school. |
307 | 110.0909 | "There is no school, there is no work and no money. |
308 | 129.4167 | "We can play a lot better than we did," he added. |
The maximum word rank of a sentence is, by definition, the rank of the rarest word in the sentence. If it is low, every word in the sentence is of high frequency. For this reason, the sentences with the lowest maximum word rank are of interest. The table lists these sentences, restricted to those longer than 40 characters.
The overall distribution of the maximum rank across all sentences of the corpus is shown in a diagram with a log-scaled x-axis.
The sentences in the table described above are of interest because they are usually easy to understand. The distribution may give insights into the corpus and may provide parameters for comparing languages.
While the distribution can be estimated from a small corpus, sentences like those in the table are rare, so a large corpus will yield more striking examples.
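The two columns of the table can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and derives ranks from a toy corpus; the helper names `rank_words` and `rank_stats` are hypothetical, and the real corpus database may tokenize and count words differently.

```python
from collections import Counter

def rank_words(sentences):
    """Map each word to its frequency rank (1 = most frequent)."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    # Sort by descending count; break ties alphabetically so ranks are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}

def rank_stats(sentence, ranks):
    """Return (maximum rank, average rank) over the words of one sentence."""
    rs = [ranks[w] for w in sentence.lower().split()]
    return max(rs), sum(rs) / len(rs)

# Toy corpus (an assumption for illustration, not data from the real corpus).
toy = ["we know we can", "we can play", "play the game"]
ranks = rank_words(toy)

for s in toy:
    print(rank_stats(s, ranks), s)
```

A sentence built only from frequent words gets a low maximum rank, while a single rare word is enough to push the maximum up even if the average stays moderate.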
Table data:
select max(w_id)-100 as m, avg(w_id)-100 as a, s.sentence from sentences s, inv_w i where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100 group by s.s_id order by m limit 30;
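To try the table query without access to the corpus database, the following sketch runs it against an in-memory SQLite database. The table and column names are taken from the query itself; the column types, the toy rows, and the reading that word ids up to 100 are excluded and that `w_id - 100` is the word rank are assumptions inferred from the query, not confirmed by the source.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query refers to.
con.executescript("""
CREATE TABLE sentences (s_id INTEGER PRIMARY KEY, sentence TEXT);
CREATE TABLE inv_w (w_id INTEGER, s_id INTEGER);
""")

# Toy rows with invented word ids; sentence 3 is too short and is filtered out.
con.executemany("INSERT INTO sentences VALUES (?, ?)", [
    (1, "It was a good win for us but we can play better."),
    (2, "They will be without power for some time."),
    (3, "Yes, we can."),
])
con.executemany("INSERT INTO inv_w VALUES (?, ?)", [
    (50, 1), (110, 1), (150, 1), (310, 1),   # id 50 is skipped by w_id > 100
    (7, 2), (120, 2), (394, 2),
    (102, 3), (105, 3),
])

query = """
select max(w_id)-100 as m, avg(w_id)-100 as a, s.sentence
from sentences s, inv_w i
where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100
group by s.s_id order by m limit 30;
"""
rows = con.execute(query).fetchall()
for m, a, sentence in rows:
    print(m, a, sentence)
```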
Distribution data:
select m, count(*) from (select 100* round((max(w_id)-100)/100) as m from sentences s, inv_w i where s.s_id=i.s_id and i.w_id>100 group by s.s_id) aa group by m;
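The binning step of the distribution query can be sketched in Python as follows. The sketch assumes that `/` in the query is floating-point division and that `round` rounds halves upward, so each maximum rank falls into the nearest multiple of 100; the database engine is not stated in the source, and the function names are hypothetical. The sample values are the first ten maximum ranks from the table above.

```python
from collections import Counter

def bin_to_hundreds(max_rank):
    # Round a non-negative rank to the nearest multiple of 100 (halves go up).
    return 100 * int(max_rank / 100 + 0.5)

def max_rank_histogram(max_ranks):
    """Count how many sentences fall into each bin of width 100."""
    counts = Counter(bin_to_hundreds(m) for m in max_ranks)
    return dict(sorted(counts.items()))

sample = [146, 157, 174, 182, 190, 190, 203, 206, 210, 221]
hist = max_rank_histogram(sample)
print(hist)
```

Because the bins have constant width, plotting them on a log-scaled x-axis compresses the high-rank bins visually; this is worth keeping in mind when interpreting the shape of the diagram.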
Explain the distribution, especially the increase in its right part.
4.5.2.2 Average word rank in sentence
4.5.2.3 Sentences consisting of many low frequency words I
4.5.2.4 Sentences consisting of many low frequency words II
4.5.2.5 Sentences consisting of short words only I
4.5.2.6 Sentences consisting of short words only II
4.5.2.7 Sentences consisting of long words only I
4.5.2.8 Sentences consisting of long words only II